Faster Compact On-Line Lempel-Ziv Factorization
Authors
Abstract
We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in $O(N \log N)$ time and uses only $O(N \log \sigma)$ bits of working space, where $N$ is the length of the string and $\sigma$ is the size of the alphabet. This is a notable improvement over previous on-line algorithms that use the same order of working space but run in either $O(N \log^3 N)$ time (Okanohara & Sadakane 2009) or $O(N \log^2 N)$ time (Starikovskaya 2012). The key to our new algorithm is the use of an elegant but less popular index structure called Directed Acyclic Word Graphs, or DAWGs (Blumer et al. 1985). We also present an opportunistic variant of our algorithm which, given the run-length encoding of size $m$ of a string of length $N$, computes the Lempel-Ziv factorization of the string on-line in $O\left(m \cdot \min\left\{\frac{(\log\log m)(\log\log N)}{\log\log\log N}, \sqrt{\frac{\log m}{\log\log m}}\right\}\right)$ time and $O(m \log N)$ bits of space.

1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
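To illustrate how a DAWG supports on-line LZ factorization, here is a minimal Python sketch. It assumes the self-referential LZ77 variant, in which each factor is the longest prefix of the unread suffix that also starts at some earlier position (overlaps allowed), or a single character that has not appeared before. The key observation is that `text[start:j+1]` has an occurrence starting before `start` exactly when it occurs in `text[0:j]`. A DAWG of the prefix read so far can therefore answer each extension step before the new character is appended. The sketch builds the DAWG (suffix automaton) with the standard pointer-based construction. As a result it uses on the order of N log N bits, and it does not reproduce the paper's compact O(N log σ)-bit representation or its time analysis. The names `SuffixAutomaton` and `lz_factorize_online` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class SuffixAutomaton:
    """Pointer-based DAWG (suffix automaton) of the text appended so far."""

    def __init__(self) -> None:
        self.next: List[Dict[str, int]] = [{}]  # outgoing edges of each state
        self.link: List[int] = [-1]             # suffix link of each state
        self.length: List[int] = [0]            # longest string in each state
        self.last = 0                           # state of the whole text

    def _new_state(self, length: int, link: int, edges: Dict[str, int]) -> int:
        self.next.append(edges)
        self.link.append(link)
        self.length.append(length)
        return len(self.length) - 1

    def extend(self, c: str) -> Optional[Tuple[int, int]]:
        """Append c. Return (split_state, clone) if a state was split, else None."""
        cur = self._new_state(self.length[self.last] + 1, -1, {})
        p = self.last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        split = None
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][c]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Strings of q no longer than length[p] + 1 move to the clone.
                clone = self._new_state(self.length[p] + 1, self.link[q],
                                        dict(self.next[q]))
                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
                split = (q, clone)
        self.last = cur
        return split


def lz_factorize_online(text: Iterable[str]) -> List[Tuple[int, int]]:
    """Return LZ factors as (start, length) pairs, reading text one char at a time."""
    dawg = SuffixAutomaton()
    factors: List[Tuple[int, int]] = []
    start, matched, state = 0, 0, 0  # invariant: state holds text[start:j] in dawg
    for j, c in enumerate(text):
        # dawg currently indexes text[0:j]; text[start:j+1] occurs in text[0:j]
        # exactly when it has an occurrence starting before `start`.
        if c not in dawg.next[state] and matched > 0:
            factors.append((start, matched))  # current factor cannot grow
            start, matched, state = j, 0, 0   # restart the match at c
        if c in dawg.next[state]:
            state = dawg.next[state][c]
            matched += 1
        else:
            factors.append((j, 1))            # c has not appeared before
            start, matched, state = j + 1, 0, 0
        split = dawg.extend(c)
        # Only one state can split per extension; follow the match if it moved.
        if split is not None and state == split[0] and matched <= dawg.length[split[1]]:
            state = split[1]
    if matched > 0:
        factors.append((start, matched))
    return factors
```

For example, `lz_factorize_online("abababab")` yields `[(0, 1), (1, 1), (2, 6)]`. The third factor overlaps its own earlier occurrence. The suffix automaton splits at most one existing state per appended character, so keeping the current match state valid requires only the single check after `extend`.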
Similar resources
Computing Reversed Lempel-Ziv Factorization Online
Kolpakov and Kucherov proposed a variant of the Lempel-Ziv factorization, called the reversed Lempel-Ziv (RLZ) factorization (Theoretical Computer Science, 410(51):5365–5373, 2009). In this paper, we present an on-line algorithm that computes the RLZ factorization of a given string w of length n in O(n log n) time using O(n log σ) bits of space, where σ ≤ n is the alphabet size. Also, we introd...
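To make the RLZ definition concrete, here is a brute-force Python sketch. It assumes the following definition: each factor is the longest prefix of the unprocessed suffix whose reversal occurs as a substring of the already processed prefix, or a single fresh character when no such nonempty prefix exists. The function name `rlz_factorize_naive` is hypothetical. This is deliberately naive and is not the O(n log n)-time, O(n log σ)-bit algorithm described in the abstract.

```python
from typing import List, Tuple


def rlz_factorize_naive(w: str) -> List[Tuple[int, int]]:
    """Return RLZ factors as (start, length) pairs by brute force (assumed definition)."""
    factors: List[Tuple[int, int]] = []
    n = len(w)
    p = 0
    while p < n:
        processed = w[:p]
        length = 0
        # If the reversal of a longer candidate occurs, so does the shorter one,
        # so greedy extension finds the longest valid factor.
        while p + length < n and w[p:p + length + 1][::-1] in processed:
            length += 1
        if length == 0:
            length = 1  # fresh character: no reversed occurrence in the processed prefix
        factors.append((p, length))
        p += length
    return factors
```

For example, `rlz_factorize_naive("abba")` returns `[(0, 1), (1, 1), (2, 2)]`. The last factor `ba` qualifies because its reversal `ab` occurs in the processed prefix `ab`.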
On Tinhofer's Linear Programming Approach to Isomorphism Testing
On the complexity of master problems; Emergence on decreasing sandpile models; 14:35 Kosolobov, Faster lightweight Lempel-Ziv parsing; Durand, Romashchenko, Quasiperiodicity and non-computability in tilings; On the Complexity of Noncommutative Polynomial Factorization
Lempel-Ziv Factorization May Be Harder Than Computing All Runs
The complexity of computing the Lempel-Ziv factorization and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over an ordered alphabet. It is known that both these problems can be solved by RAM algorithms in O(n log σ) time, where n is the length of the input string and σ is the number of distinct letters in it. We prove an Ω(n log σ) lower bound on ...
On the Size of Lempel-Ziv and Lyndon Factorizations
Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describin...
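For comparison with the LZ factorization, the Lyndon factorization itself can be computed with Duval's classical linear-time algorithm. The minimal Python sketch below is not taken from the paper, and the function name `lyndon_factorization` is hypothetical. It returns the unique factorization of `s` into a lexicographically non-increasing sequence of Lyndon words.

```python
from typing import List


def lyndon_factorization(s: str) -> List[str]:
    """Duval's algorithm: split s into non-increasing Lyndon words."""
    n = len(s)
    factors: List[str] = []
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend while s[i:j] remains a prefix of a power of a Lyndon word.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i      # s[i:j+1] is itself a Lyndon word
            else:
                k += 1     # continue matching the current period
            j += 1
        # Emit copies of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

For example, `lyndon_factorization("banana")` returns `["b", "an", "an", "a"]`.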
Simpler and Faster Lempel (arXiv:1211.3642v2 [cs.DS], 18 Jan 2013)
We present a new, simple, and efficient approach for computing the Lempel-Ziv (LZ77) factorization of a string in linear time, based on suffix arrays. Computational experiments on various data sets show that our approach consistently outperforms the fastest previous algorithm, LZ_OG (Ohlebusch and Gog 2011), and can be up to 2 to 3 times faster in the processing after obtaining the suffix array, w...